Method and system for document image layout deconstruction and redisplay system

ABSTRACT

The invention converts a document originating in a page-image format into a form suitable for an arbitrarily sized display, by reformatting or “re-flowing” of the document to fit an arbitrarily sized display device.

BACKGROUND OF THE INVENTION

[0001] 1. Field of Invention

[0002] The invention relates generally to the problem of making an arbitrary document conveniently readable on an arbitrarily sized display.

[0003] 2. Description of Related Art

[0004] Existing systems for rendering page-image versions of documents on display screens have required manual activities to improve the rendering, or clumsy panning mechanisms to view direct display of page images on wrong-sized surfaces. In particular, it has been necessary to either (1) key in the entire text manually, or (2) process the page images through an optical character recognition (OCR) system and then manually tag the resulting text in order to preserve visually important layout features.

[0005] Problems with existing systems include: (a) the high expense of manual keying and/or correcting of OCR results and manual tagging; (b) the risk of highly visible and disturbing errors in the text resulting from OCR mistakes; (c) the loss of meaningful or aesthetically pleasing typeface and type size choices, graphics and other non-text elements; and (d) the loss of proper placement of elements on the page.

[0006] Such problems are significant, for example, because book publishers are increasingly creating page-image versions of books currently being published, as well as books from their backlists. The page-image versions are being created for print-on-demand usage. While print-on-demand images can be re-targeted to slightly larger or slightly smaller formats by scaling the images, they cannot currently be re-used for most electronic book purposes without either re-keying the book into XML format, or scanning the page images using OCR and manually correcting the recognized text.

SUMMARY OF INVENTION

[0007] The invention provides methods and systems for converting any document originating in a page-image format, such as a scanned hardcopy document represented as a bitmap, into a form suitable for display on screens of arbitrary size, through automatic reformatting or “reflowing” of document contents.

[0008] Reflowing is a process that moves text elements (often words) from one text-line to another so that each line of text can be contained within given margins. Reflowing typically breaks or fills lines of text with words, and may re-justify column margins, so that the full width of a display is used and no manual ‘panning’ across the text is needed. For example, if the display area within which lines of text appear is narrowed, words may need to be moved from one text-line to another to shorten the text-lines, so that no text-line is too long to be entirely visible in the display area. Conversely, if the display area is widened, words may be moved from one text-line to another so that the text-lines become longer, allowing more of the text to be seen at once without any word image being obscured.
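The break-or-fill behaviour described above can be illustrated with a simple greedy line filler. This is a minimal sketch, not the method claimed here: it assumes that each word image has been reduced to a pixel width and that the inter-word space is a fixed `space` value. The function name `reflow` is hypothetical.

```python
def reflow(word_widths, max_width, space=1):
    """Greedily pack word images (given by pixel width) into text-lines.

    Returns a list of lines, each a list of word indices in reading order.
    A word wider than max_width is placed alone on its own line, not dropped.
    """
    lines, current, used = [], [], 0
    for i, w in enumerate(word_widths):
        # Width the current line would have if this word were appended.
        needed = w if not current else used + space + w
        if current and needed > max_width:
            lines.append(current)      # break: start a new text-line
            current, used = [i], w
        else:
            current.append(i)          # fill: the word fits on this line
            used = needed
    if current:
        lines.append(current)
    return lines

# Narrowing the display moves words onto additional lines.
print(reflow([3, 4, 2, 5], max_width=8))   # [[0, 1], [2, 3]]
print(reflow([3, 4, 2, 5], max_width=20))  # [[0, 1, 2, 3]]
```

A production reflower would also re-justify lines and handle hyphenated word images. The greedy choice is used here only because it is the simplest rule that exhibits the narrowing and widening behaviour.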

[0009] Image and layout analysis transforms the raw document image into a form that is reflowable and that can be more compactly represented on hand-held devices. In various exemplary embodiments, image analysis begins with adaptive thresholding and binarization. For each pixel, the maximum and minimum values within a region around that pixel are determined using greyscale morphology. If the difference between these two values is smaller than a statistically determined threshold, the region is judged to contain only white pixels. If the difference is above the threshold, the region contains both black and white pixels, and the minimum and maximum values represent the black ink and white paper background values, respectively. In the first case, the pixel value is normalized by bringing the estimated white level to the actual white level of the display. In the second case, the pixel value is normalized by expanding the range between the estimated white and black levels to the full range between the white level and the black level of the display. After this normalization process, a standard thresholding method can be applied.
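The two normalization cases can be sketched in plain Python. This sketch rests on several assumptions. A fixed `contrast` value stands in for the statistically determined threshold, the window radius is arbitrary, and a brute-force window min/max stands in for greyscale erosion and dilation. The function name is hypothetical.

```python
def normalize_and_binarize(img, radius=1, contrast=40, white=255, cutoff=128):
    """Adaptive normalization followed by a fixed threshold.

    img is a list of rows of grey values (0 = black, 255 = white).
    Returns a same-sized grid with 1 for ink and 0 for background.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Local min/max over a square window (greyscale erosion/dilation).
            window = [img[yy][xx]
                      for yy in range(max(0, y - radius), min(h, y + radius + 1))
                      for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            lo, hi = min(window), max(window)
            v = img[y][x]
            if hi - lo < contrast:
                # Low contrast: treat as paper only; map local white to display white.
                norm = v * white / hi if hi else white
            else:
                # Ink and paper present: stretch [lo, hi] to the full display range.
                norm = (v - lo) * white / (hi - lo)
            out[y][x] = 1 if norm < cutoff else 0
    return out
```

Under these assumptions, a dim but uniform region (every pixel 100) comes out as background. A single global cutoff of 128 applied to the raw values would have marked that region as ink.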

[0010] In the thresholded image, connected components are labeled using a scan algorithm combined with an efficient union-find data structure. Then, a bounding box is determined for each connected component. This usually results in several thousand connected components per page. Each connected component may represent a single character, a portion of a character, a collection of touching characters, background noise, or parts of a line drawing or image. These bounding boxes for connected components are the basis of the subsequent layout analysis.
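A two-pass scan with union-find labeling, followed by bounding-box collection, might look like the following. This is an illustrative sketch only. It assumes 8-connectivity, represents ink as 1, and returns inclusive boxes `(x0, y0, x1, y1)`. The helper names are hypothetical.

```python
def find(parent, i):
    # Find the root label, halving the path as we go.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)

def component_boxes(binary):
    """Label 8-connected ink components and return their bounding boxes."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]  # label 0 is the background
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            # Neighbours already visited by the raster scan: W, NW, N, NE.
            neigh = [labels[y + dy][x + dx]
                     for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1))
                     if 0 <= y + dy and 0 <= x + dx < w and labels[y + dy][x + dx]]
            if not neigh:
                parent.append(len(parent))      # new provisional label
                labels[y][x] = len(parent) - 1
            else:
                labels[y][x] = min(neigh)
                for n in neigh:                 # record label equivalences
                    union(parent, labels[y][x], n)
    boxes = {}
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                r = find(parent, labels[y][x])
                x0, y0, x1, y1 = boxes.get(r, (x, y, x, y))
                boxes[r] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return sorted(boxes.values())
```

For a U-shaped stroke, the two arms receive different provisional labels in the first row. The union step merges them when the scan reaches the joining bottom row, so the shape yields a single bounding box.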

[0011] In various exemplary embodiments, for layout analysis, the bounding boxes corresponding to characters in the running text of the document, as well as in a few other page elements, such as, for example, headers, footers, and/or section headings, are used to provide important information about the layout of the page needed for reflowing. In particular, the bounding boxes and their spatial arrangement identify page rotation and skew, column boundaries, what tokens may be needed for token-based compression, reading order, and/or how the text should flow between different parts of the layout. Bounding boxes that are not found to represent “text” in this filtering operation are not lost, however. Such bounding boxes can later be incorporated into the output from the system as graphical elements.

[0012] The dimensions of bounding boxes representing body text are found using a simple statistical procedure. The distribution of bounding box heights is treated as a statistical mixture of several components. For most pages containing text, the largest mixture component often corresponds to lower case letters at the predominant font size. This component is used to find the x-height of the predominant font. That dimension is then used to filter out bounding boxes that are either too small or too large to represent body text or standard headings.
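The height-based filter can be approximated as follows. This is a deliberately simplified sketch. It takes the most frequent bounding-box height as a stand-in for the largest mixture component rather than fitting a mixture model. The acceptance band of 0.5 to 3 times the x-height is an assumed value, and the function name is hypothetical.

```python
from collections import Counter

def body_text_filter(boxes, low=0.5, high=3.0):
    """Estimate the x-height and keep boxes of plausible text size.

    boxes are inclusive (x0, y0, x1, y1) tuples; returns (x_height, kept_boxes).
    """
    heights = [y1 - y0 + 1 for (x0, y0, x1, y1) in boxes]
    # Stand-in for the largest mixture component: the most common height.
    x_height = Counter(heights).most_common(1)[0][0]
    keep = [b for b, hgt in zip(boxes, heights)
            if low * x_height <= hgt <= high * x_height]
    return x_height, keep
```

Boxes that are rejected here, such as specks of noise or large figures, would be kept aside as graphical elements rather than discarded, as the preceding paragraph notes.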

[0013] Given a collection of bounding boxes representing text, it is desirable to find text lines and column boundaries. The approach used in various exemplary embodiments to identify text lines and column boundaries relies on a branch-and-bound algorithm that finds maximum likelihood matches against line models under a robust least square error model, i.e., a Gaussian noise model in the presence of spurious background features. Text line models are described by three parameters: the angle and the offset of the line, and the descender height. Bounding boxes whose alignment point, that is, the center of the bottom side of the bounding box, rests either on the line or at a distance given by the descender height below the line, are considered to match the line. Matches are penalized by the square of their distance from the model, up to a threshold value ε, which is usually on the order of five pixels.
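The robust matching cost can be written down directly. The sketch below scores a candidate (angle, offset, descender) model with the truncated squared error described above. It then finds the best model by exhaustive search over a small parameter grid. That grid search is a plain substitute for branch-and-bound: it returns the same optimum over the grid but is much slower on real pages. The sketch also assumes near-horizontal lines, so it uses vertical rather than perpendicular distance, with y growing downward. Function names are hypothetical.

```python
import math

def line_cost(points, angle, offset, descender, eps=5.0):
    """Truncated squared error of alignment points against a text line model."""
    m = math.tan(angle)
    total = 0.0
    for x, y in points:
        base = m * x + offset
        # A point matches the baseline or the descender line, whichever is closer.
        r = min(abs(y - base), abs(y - base - descender))
        total += min(r * r, eps * eps)   # outliers cost at most eps squared
    return total

def best_line(boxes, angles, offsets, descenders, eps=5.0):
    """Exhaustive grid search (stand-in for branch-and-bound).

    Returns (cost, angle, offset, descender) for the lowest-cost model.
    """
    # Alignment point: centre of the bottom side of each bounding box.
    pts = [((x0 + x1) / 2.0, y1) for (x0, y0, x1, y1) in boxes]
    return min((line_cost(pts, a, o, d, eps), a, o, d)
               for a in angles for o in offsets for d in descenders)
```

Because each point's cost is capped at ε², a single stray component far from the line adds a constant penalty. Such a component cannot pull the fitted line toward itself, which is the purpose of the robust model.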

[0014] After a text line has been found, the bounding box that bounds all of the connected components that participated in the match is determined. All other connected components that fall within that bounding box are assigned to the same text line. This tends to “sweep up” punctuation marks, accents, and “i”-dots that would otherwise be missed. Within each text line, multiple bounding boxes whose projections onto the baseline overlap are merged. This results in bounding boxes that predominantly contain complete characters, as opposed to bounding boxes that contain only portions of characters. The resulting bounding boxes are then ordered by the x-coordinate of their lower left corners to obtain a sequence of character images in reading order. Multiple text lines are found using a greedy strategy. The best match is identified first, and the bounding boxes that participated in that match are removed from further consideration. The next best text line is then found, and the process repeats until no good text line matches can be identified.
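The merging of boxes whose baseline projections overlap, followed by left-to-right ordering, can be sketched as an interval merge. This sketch assumes a horizontal baseline, so that the projection of a box onto the baseline is simply its x-interval. Boxes are inclusive `(x0, y0, x1, y1)` tuples, and the function name is hypothetical.

```python
def merge_overlapping(boxes):
    """Merge boxes on one text line whose x-projections overlap.

    Returns the merged boxes ordered left to right (reading order).
    """
    merged = []
    for b in sorted(boxes):                # sort by x0, the left edge
        if merged and b[0] <= merged[-1][2]:
            # Projections overlap: grow the previous box to cover this one.
            x0, y0, x1, y1 = merged[-1]
            merged[-1] = (x0, min(y0, b[1]), max(x1, b[2]), max(y1, b[3]))
        else:
            merged.append(b)
    return merged
```

For example, the stem and the dot of an “i” share an x-range and become one box. A neighbouring letter whose x-range does not overlap the “i” stays separate.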

[0015] This approach to text line modeling has several advantages over known projection or linking methods. First, different text lines can have different orientations. Second, by taking into account both the baseline and the descender line, the technique can find text lines that are missed by known text line finders. Third, the matches returned by this method follow the individual text lines more accurately than other known methods.

[0016] Column boundaries are identified in a similar manner by finding globally optimal maximum likelihood matches of the center of the left side of bounding boxes against a line model. In order to reduce background noise, before the line finder is applied to column finding, statistics about the distribution of horizontal distances between bounding boxes are used to estimate the intercharacter and inter-word spacing, i.e., the two largest components in the statistical distribution of horizontal bounding box distances. The bounding boxes for characters are then merged into words. This reduces the number of bounding boxes that need to be considered for column matching severalfold, and tends to improve the reliability of column boundary detection.
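Estimating the two spacing classes and merging characters into words can be sketched as below. Instead of fitting a statistical mixture, this sketch splits the gap values with a two-cluster one-dimensional k-means, which is a simpler substitute. It places the character/word cut midway between the two cluster centres, which is also an assumption. Function names are hypothetical.

```python
def split_gaps(gaps, iters=20):
    """Two-cluster 1-D k-means; returns (intercharacter, inter-word) estimates."""
    lo, hi = float(min(gaps)), float(max(gaps))
    for _ in range(iters):
        mid = (lo + hi) / 2
        small = [g for g in gaps if g <= mid]
        large = [g for g in gaps if g > mid]
        if not small or not large:
            break                           # only one spacing class present
        lo, hi = sum(small) / len(small), sum(large) / len(large)
    return lo, hi

def group_words(boxes):
    """Merge character boxes on one text line into word boxes."""
    boxes = sorted(boxes)
    if len(boxes) < 2:
        return boxes
    # Horizontal distance from each box's right edge to the next box's left edge.
    gaps = [b[0] - a[2] for a, b in zip(boxes, boxes[1:])]
    char_gap, word_gap = split_gaps(gaps)
    cut = (char_gap + word_gap) / 2
    words = [boxes[0]]
    for b, g in zip(boxes[1:], gaps):
        if g <= cut:                        # intercharacter gap: same word
            x0, y0, x1, y1 = words[-1]
            words[-1] = (x0, min(y0, b[1]), max(x1, b[2]), max(y1, b[3]))
        else:                               # inter-word gap: start a new word
            words.append(b)
    return words
```

When every gap is the same size, this sketch treats the whole line as one word. A fuller implementation would fall back on page-wide spacing statistics in that case.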

[0017] Any connected components that are not part of a text line are grouped together and treated as images. For a single column document, a sequence of characters, whitespaces, and images in reading order is obtained by enumerating text lines and bounding boxes of images in order of their y-coordinates. For a double column document, the two columns are treated as if the right column were placed under the left column.
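The reading-order rule for one and two columns reduces to a sort key. In this sketch, each text line or image is an `(x0, y0, label)` tuple. A single known `column_split` x-coordinate separates the two columns; in practice the column boundary would come from the column finder. All names are hypothetical.

```python
def reading_order(items, column_split=None):
    """Order text lines and images for reflow.

    items are (x0, y0, label) tuples. With column_split set, everything whose
    left edge is at or beyond it is read after the entire left column.
    """
    def key(item):
        x0, y0, _ = item
        col = 1 if column_split is not None and x0 >= column_split else 0
        return (col, y0)   # left column first, then top to bottom
    return [label for _, _, label in sorted(items, key=key)]

items = [(10, 50, "L2"), (310, 10, "R1"), (10, 10, "L1"), (310, 60, "R2")]
print(reading_order(items, column_split=300))  # ['L1', 'L2', 'R1', 'R2']
```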

[0018] This simple layout analysis technique copes with a large number of commonly occurring layouts in printed documents and transforms such layouts into a sequence of images that can be reflowed and displayed on a smaller-area display device. The simple technique works well in these applications because the requirements of reflowing for a smaller-area display device, such as a document reader, are less stringent than for other layout analysis tasks, like rendering into a word processor. Since the output of the layout analysis will only be used for reflowing and not for editing, no semantic labels need to be attached to text blocks. Because the documents are reflowed on a smaller-area screen, there is also no user expectation that a rendering of the output of the layout analysis precisely match the layout of the input document. Furthermore, if page elements, like headers, footers, and/or page numbers, are incorporated into the output of the layout analysis, users can easily skip such page elements during reading. Such page elements may also serve as convenient navigational signposts on the smaller-area display device.

[0019] In various exemplary embodiments, the methods and systems according to this invention more specifically provide a two-stage system which analyzes, or “deconstructs”, page image layouts. Such deconstruction includes both physical, e.g., geometric, and logical, e.g., functional, segmentation of page images. The segmented image elements may include blocks, lines, and/or words of text, and other segmented image elements. The segmented image elements are then synthesized and converted into an intermediate data structure, including images of words in correct reading order and links to non-textual image elements. The intermediate data structure may be expressed in a variety of formats such as, for example, Open E-book XML, Adobe™ PDF 1.4 or later, HTML and/or XHTML, as well as other useful formats that are now available or may be developed in the future. In various exemplary embodiments, the methods and systems according to this invention then distill, or convert, the intermediate data structure for “redisplay” into any of a number of standard electronic book formats, Internet browsable formats, and/or print formats.

[0020] In various exemplary embodiments of the methods and systems according to this invention, the intermediate data structure may contain tags, such as those used in SGML and XML, which state the logical functions or geometric properties of the particular image elements the tags annotate. It is also possible that, in various exemplary embodiments, some image elements may not have tags attached to them. For example, in instances where the functions and properties of image elements may be inferable from their position and the position of other tagged and untagged image elements in the intermediate data structure, such tags may not be necessary.

[0021] It is also possible that, in various exemplary embodiments, special image elements that can be used for this purpose are not extracted from the original page image, but are created as tagged or untagged elements. Such special image elements can be inserted into the intermediate data structure in an order that would define the desired functions and properties of other image elements. For example, a special image element may be a blank that represents a space between two words. Further, special non-image markers, other than tags attached to particular image elements, could be inserted so that the functions and properties of at least some of the image elements may be inferred from their relative position with respect to the markers within the intermediate data structure.

[0022] To prepare the intermediate data structure for redisplay, the intermediate data structure may be converted, for example, to HTML for use on a standard Internet browser, or to Open E-book XML format for use on an Open E-book reader. Other methods may include, for example, converting the intermediate data structure to Plucker format for use on a Plucker electronic book viewer, to Microsoft Reader format for display using Microsoft Reader, or to a print format for printing to paper or the like.

[0023] In any document image, the physical layout geometry is fixed and the logical or functional layout structure is implicit. That is, the structure is intended to be understood by human readers, who bring to the task of reading certain conventional expectations of the meaning and implications of layout, typeface, and type size choices. By contrast, in the intermediate data structure according to various exemplary embodiments of the methods and systems of this invention, the original fixed positions of words are noted but not strictly adhered to, so that the physical layout becomes fluid. In various exemplary embodiments, aspects of the logical structure of the document are captured explicitly and automatically, and are represented by additional information. In various exemplary embodiments, the intermediate data structure according to this invention is automatically adaptable at the time of display to the constraints of size, resolution, contrast, color, geometry, and/or the like, of any given display device or circumstance of viewing.

[0024] The adaptability enabled by the methods and systems according to this invention includes re-pagination of text, reflowing of text into text-lines (for example, re-justification, reformatting, and/or the like), and logical linking of text to associated text and/or non-text contents, such as illustrations, figures, footnotes, signatures, and/or the like. In various exemplary embodiments, the methods and systems according to this invention take into account typographical conventions used to indicate the logical elements of a document, such as titles, author lists, body text, paragraphs, and/or hyphenation, for example. In various exemplary embodiments, the methods and systems of the invention also allow the reading order to be inferred within blocks of text and/or among blocks of text on the page.

[0025] Thus, redisplaying the document is enabled for a wide range of displays whose size, resolution, contrast, available colors, and/or geometries may require the document's contents to be reformatted, reflowed, re-colored, and/or reorganized to achieve a high degree of legibility and a complete understanding of the document's contents. This is achieved without requiring OCR or re-keying, without being subject to the errors attendant on OCR or re-keying, and without losing the look and feel of the original document as chosen by the author and publisher.

[0026] In various exemplary embodiments, the methods and systems according to this invention reduce costs by obviating the need for manual keying, correction of OCR results, and/or tagging. In various exemplary embodiments, the methods and systems according to this invention tend to avoid introducing OCR character recognition errors. In various exemplary embodiments, the methods and systems according to this invention tend to preserve typeface and type size choices made by the original author and publisher, which may be helpful, or even essential, in assisting the reader in understanding the author's intent. In various exemplary embodiments, the methods and systems according to this invention also tend to preserve the association of graphics and non-textual elements with related text.

[0027] These and other features and advantages of this invention are described in, or are apparent from, the following detailed description of various exemplary embodiments of the systems and methods according to this invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0028] Various exemplary embodiments of the systems and methods according to this invention will be described in detail, with reference to the following figures, wherein:

[0029] FIG. 1 illustrates an intermediate representation of an image of a page, using XHTML;

[0030] FIG. 2 illustrates the format and content of the intermediate representation without the use of tags or explicit separators;

[0031] FIG. 3 is a flowchart outlining one exemplary embodiment of a method for document image layout deconstruction and redisplay;

[0032] FIG. 4 is a block diagram of one exemplary embodiment of a document deconstruction and display system according to this invention.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

[0033] FIG. 1 illustrates a detailed example of an intermediate data structure 260 for a page image 300. In FIG. 1, the intermediate data structure 260 is expressed using XHTML as an example of an intermediate data structure format. The page image 300 is shown schematically having a first text area 310 which functions as a title, a second text area 320 which functions as an author list, third text areas 330 which function as paragraphs, and a fourth text area 340 which functions as a page number. The structures represented by these text areas 310-340 are usually significant to both the author and the reader, and so are detected and preserved in the intermediate data structure 260. For example, the intermediate data structure 260 preserves the title text area 310 by noting the position of this title text area 310 at the top of the page image, that the text area 310 is centered, and the large typeface used in this text area 310. The position is preserved in the intermediate data structure 260 by the XHTML tag “<DIV CLASS=title ID=title>”. Also, the intermediate data structure 260 preserves the author-list text area 320 by noting its position just beneath the title text area 310. The intermediate data structure 260 preserves the centered position of the author-list text area 320, and the fact that the author-list text area 320 is printed in a large typeface that is smaller than the typeface of the title text area 310. In particular, in the specific exemplary embodiment shown in FIG. 1, the author-list text area 320 is preserved in the intermediate data structure 260 by the XHTML tag “<DIV CLASS=authors ID=authors>”.
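A fragment in the spirit of FIG. 1 can be generated with Python's standard `xml.etree.ElementTree` module. This is a hypothetical illustration, not the structure actually shown in FIG. 1. The word-image file names are invented, and element and attribute names are lowercase because XHTML requires it. Since XHTML ids must be unique, repeated roles such as paragraphs receive a numeric id suffix. That suffix convention is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

def page_to_xhtml(areas):
    """areas: list of (role, [word-image file names]) in reading order."""
    body = ET.Element("body")
    seen = {}
    for role, words in areas:
        seen[role] = seen.get(role, 0) + 1
        # XHTML ids must be unique, so repeated roles get a numeric suffix.
        area_id = role if seen[role] == 1 else f"{role}-{seen[role]}"
        div = ET.SubElement(body, "div", {"class": role, "id": area_id})
        for name in words:
            # Each word stays an image; the viewer reflows the inline images.
            ET.SubElement(div, "img", {"src": name, "alt": ""})
    return ET.tostring(body, encoding="unicode")

xhtml = page_to_xhtml([
    ("title", ["w001.png", "w002.png"]),
    ("authors", ["w003.png"]),
    ("para", ["w004.png", "w005.png"]),
    ("para", ["w006.png"]),
])
```

Keeping each word as an inline image lets an ordinary browser perform the reflow. Meanwhile, the `class` attribute records the logical role that a style sheet can use to centre titles or enlarge their typeface.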

[0034] FIG. 2 shows a representation of the page image 300 as a sequence of image elements 190, and the corresponding representative compressed image tokens 200, without using attached tags or explicit separators. For example, in a document where the functions and properties of image elements may be inferable from their position on the page and the position of other tagged and untagged image elements in the intermediate data structure, it is not necessary to tag all of the image elements.

[0035] FIG. 3 is a flowchart outlining one exemplary embodiment of a method for document image layout deconstruction and redisplay. As shown in FIG. 3, operation of the method begins in step S100 and continues to step S110, where a document is input by scanning, or by use of another data source that provides a document in a page image format. The document may be represented as a set of page images, such as bi-level, gray-scale, or color images, in one of a set of image file formats such as TIFF and JPEG, for example.

[0036] Then, in step S120, the image file of the page image is analyzed to identify text image areas and non-text image areas. Text area images may include, for example, blocks (or columns), lines, words, or characters of text. Non-text area images may include, for example, illustrations, figures, graphics, line-art, photographs, handwriting, footnotes, signatures and/or the like.

[0037] Next, in step S130, the identified text image areas and non-text image areas are located and isolated. Locating and isolating text image areas may include, for example, locating and isolating the baseline and, possibly, the top-line and/or cap-line of each text line image. The isolated line regions are modeled as line segments that run from one end of the text line image to the other. Baselines may be modeled as straight lines that are horizontal or, in the case of Japanese, Chinese, and other scripts, vertical, or oriented at some angle near the horizontal or the vertical. Baselines may also be modeled as curved functions. Operation then continues in step S140.

[0038] In step S140, the isolated text image areas are selected for further processing. Next, in step S150, the text line regions of the selected text image areas are located and isolated, and the layout properties of the selected text image areas are then determined. Layout properties may include, for example, indentation, left and/or right justification, centering, hyphenation, special spacing (e.g. for tabular data), proximity to figures and other non-textual areas, and the like. Layout properties may also include type size and typeface-family properties (e.g. roman/bold/italic styles) that may indicate the function of the text within the page. Operation then continues in step S160.

[0039] In step S160, the located text line regions are further processed into a set of segmented image elements. Then, in step S170, the segmented image elements are read and basic textual elements are located and isolated. Basic textual elements may include, for example, words, numbers, dates, proper names, bibliographic references, and references to figures and/or other non-textual elements within or outside the document. The textual elements will become the basic image units which will be reflowed and reconstructed in later stages. As part of locating the segmented image elements, each segmented image element is labeled with its position relative to the baseline of the text line. When the text-lines are later reflowed, the reconstructed baseline can then be referred to when placing the corresponding segmented image elements, so that the elements appear to share the newly constructed baseline. Operation then continues to step S180.
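Baseline-relative labeling and re-placement can be sketched as a pair of small functions. This sketch assumes horizontal baselines and y coordinates that grow downward, so a descender has a positive offset below the baseline. Boxes are `(x0, y0, x1, y1)` tuples. The function names and the fixed inter-word `space` are hypothetical.

```python
def label_relative(boxes, baseline_y):
    """Pair each element box with its bottom edge's offset from the baseline.

    Descenders (e.g. 'p', 'g') get a positive offset; elements resting on
    the baseline get 0.
    """
    return [(box, box[3] - baseline_y) for box in boxes]

def place_on_baseline(labeled, new_baseline_y, start_x=0, space=4):
    """Lay labeled elements out left to right on a newly constructed baseline."""
    placed, x = [], start_x
    for (x0, y0, x1, y1), drop in labeled:
        w, h = x1 - x0, y1 - y0
        bottom = new_baseline_y + drop      # restore the baseline-relative offset
        placed.append((x, bottom - h, x + w, bottom))
        x += w + space
    return placed

labeled = label_relative([(10, 80, 20, 100), (24, 85, 30, 104)], baseline_y=100)
print(place_on_baseline(labeled, new_baseline_y=50))
# [(0, 30, 10, 50), (14, 35, 20, 54)]
```

The second element keeps its 4-pixel descent below the new baseline. As a result, the two word images still appear to share one line after they have been moved.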

[0040] In step S180, the segmented image elements are labeled with their baseline-relative positions. Next, in step S190, the segmented image elements and their baseline-relative positions are compressed into token-based image elements. Then, in step S200, the image elements are synthesized into an intermediate data structure. Operation then continues to step S210.

[0041] In step S210, the intermediate data structure is stored to retain the data in an intermediate format until distilling and redisplay is desired. Then, in step S220, the stored data is distilled to convert the data into a device-specific display format. The intermediate data structure may be converted, for example, to HTML for use on a standard Internet browser, or to Open E-book XML format for use on an Open E-book reader. Other methods may include, for example, converting the intermediate data structure to Plucker format for use on a Plucker electronic book viewer, to Microsoft Reader format for display using Microsoft Reader, or to a print format for printing to paper or the like. Next, in step S230, the distilled data is displayed to the user. Operation of the method then continues to step S240, where operation of the method ends.

[0042] In various exemplary embodiments of this invention, the intermediate data structure may also be in a form that can be processed by an E-book distiller for redisplaying the intermediate data structure on an E-book reader. If the intended use is to display an electronic book, an E-book distiller reads the intermediate data structure and prepares it for display on a specific device such as a PDA, a computer graphical interface window, or any other graphical display device. Such processing of the intermediate data structure is not limited to an E-book distiller, but may be accomplished by any method or device for re-converting the intermediate data structure for redisplay on a selected display device.

[0043] In various exemplary embodiments of this invention, the intermediate data structure may be expressed in a variety of formats such as, for example, Open E-book XML, Adobe™ PDF 1.4 or later, HTML and/or XHTML, as well as other useful formats that are now available or may be developed in the future. In various exemplary embodiments of this invention, the intermediate data structure may contain tags, such as those used in SGML and XML.

[0044] In various exemplary embodiments, in step S190, the segmented image elements are compressed into a smaller number of prototype images, so that each incoming element may be replaced by a prototype that is visually similar to, or perhaps indistinguishable from, that image element. This is an instance of ‘token-based’ compression where the tokens are the image elements. Therefore, if the image elements are words, then the tokens are words. Alternatively, it may be advantageous to cut the image elements into smaller images corresponding exactly or approximately to individual characters, since there are fewer distinct characters than words in some languages. Compressing the segmented image elements may further include writing a set, or dictionary, of representative compressed image tokens, and a list of references into the representative compressed image tokens. Each reference represents an original image element labeled with its position relative to the baseline.
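The token dictionary and reference list can be illustrated with a toy matcher. This is a simplified sketch, not a production symbol matcher. Bitmaps are tuples of row strings, similarity is a pixel-wise Hamming distance between equal-sized bitmaps, and the `max_diff` tolerance is an assumed parameter. Each reference is an `(token_index, baseline_offset)` pair, and all names are hypothetical.

```python
def hamming(a, b):
    """Count differing pixels; None if the bitmaps differ in size."""
    if len(a) != len(b) or any(len(r) != len(s) for r, s in zip(a, b)):
        return None  # different sizes are never matched in this sketch
    return sum(p != q for r, s in zip(a, b) for p, q in zip(r, s))

def compress(elements, max_diff=1):
    """Token-based compression of (bitmap, baseline_offset) elements.

    Returns (tokens, refs): the prototype dictionary, and one
    (token_index, baseline_offset) reference per original element.
    """
    tokens, refs = [], []
    for bitmap, offset in elements:
        for i, proto in enumerate(tokens):
            d = hamming(proto, bitmap)
            if d is not None and d <= max_diff:
                refs.append((i, offset))     # reuse a visually similar prototype
                break
        else:
            tokens.append(bitmap)            # no match: new dictionary entry
            refs.append((len(tokens) - 1, offset))
    return tokens, refs
```

Real word images vary in size and alignment, so a practical matcher would compare bitmaps after registration and size normalization. The structure of the output, a small dictionary plus a list of references, is the part this sketch is meant to show.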

[0045] In various exemplary embodiments of this invention, the non-text image areas, compressed non-text image areas, the set of representative compressed image tokens, the segmented image elements and/or the layout characteristics are synthesized in step S200 into an intermediate data structure. However, in various exemplary embodiments of this invention, non-text area images may optionally first be compressed in step S190, for file compression, before being synthesized in step S200 for integration into the intermediate data structure. Additionally, in various exemplary embodiments of this invention, the segmented image elements may be optionally compressed in step S190 before being synthesized in step S200 for integration into the intermediate data structure. Determining whether to compress the non-text image areas and the segmented image elements may depend on file size or other user-specific parameters. If the intermediate data structure does not include compressed data, then the intermediate data structure may be represented as XHTML, for example.

[0046] In various exemplary embodiments of this invention, the intermediate data structure may also contain a tagged list containing references to every textual and non-image element that is proximate to, or referenced by, a textual image element, as well as layout characteristics such as indentation, hyphenation, spacing, and the like. In addition to this list, a set of representative compressed image tokens can be written to a separate but intimately associated image element database. The intermediate data structure contains all the information required to support the reflowing and the reconstruction of the image elements.

[0047] FIG. 4 is a block diagram of one exemplary embodiment of a document deconstruction and redisplay system 400 according to this invention. As shown in FIG. 4, one or more user input devices 480 are connected over one or more links 482 to an input/output interface 410. Additionally, a data source 500 is connected over a link 502 to the input/output interface 410. A data sink 600 is also connected to the input/output interface 410 through a link 602.

[0048] Each of the links 482, 502, 602 can be implemented using any known or later developed device or system for connecting the one or more user input devices 480, the data source 500 and the data sink 600, respectively, to the document layout deconstruction and redisplay system 400, including a direct cable connection, a connection over a wide area network or a local area network, a connection over an intranet, a connection over the Internet, or a connection over any other distributed processing network or system. In general, each of the links 482, 502, 602 can be any known or later developed connection system or structure usable to connect the one or more user input devices 480, the data source 500 and the data sink 600, respectively, to the document layout deconstruction and redisplay system 400.

[0049] The input/output interface 410 inputs data from the data source 500 and/or the one or more user input devices 480 and outputs data to the data sink 600 via the link 602. The input/output interface 410 also provides the received data to one or more of the controller 420, the memory 430, a deconstructing circuit, routine or application 440, a synthesizing circuit, routine or application 450, a distilling circuit, routine or application 460, and/or a display 490. The input/output interface 410 receives data from one or more of the controller 420, the memory 430, the deconstructing circuit, routine or application 440, the synthesizing circuit, routine or application 450, and/or the distilling circuit, routine or application 460.

[0050] The memory 430 stores data received from the deconstructing circuit, routine or application 440, the synthesizing circuit, routine or application 450, the distilling circuit, routine or application 460, and/or the input/output interface 410. For example, the original data, the deconstructed data, the synthesized data, and/or the distilled data may be stored in the memory 430. The memory can also store one or more control routines used by the controller 420 to operate the document layout deconstruction and redisplay system 400.

[0051] The memory 430 can be implemented using any appropriate combination of alterable, volatile or non-volatile memory or non-alterable, or fixed, memory. The alterable memory, whether volatile or non-volatile, can be implemented using any one or more of static or dynamic RAM, a floppy disk and disk drive, a writable or re-writeable optical disk and disk drive, a hard drive, flash memory or the like. Similarly, the non-alterable or fixed memory can be implemented using any one or more of ROM, PROM, EPROM, EEPROM, an optical ROM disk, such as a CD-ROM or DVD-ROM disk, and disk drive or the like.

[0052] It should be understood that each of the circuits or routines shown in FIG. 4 can be implemented as portions of a suitably programmed general purpose computer. Alternatively, each of the circuits or routines shown in FIG. 4 can be implemented as physically distinct hardware circuits within an ASIC, or using an FPGA, a PDL, a PLA or a PAL, or using discrete logic elements or discrete circuit elements. The particular form each of the circuits or routines shown in FIG. 4 will take is a design choice and will be obvious and predictable to those skilled in the art.

[0053] In operation, the data source 500 outputs a set of original data, i.e., an input document, a scanned document or the like, over the link 502 to the input/output interface 410. Similarly, the user input device 480 can be used to input one or more of a set of newly created original data, scanned data or the like over the link 482 to the input/output interface 410. The input/output interface 410 directs the received set of data to the memory 430 under the control of the controller 420. However, it should be appreciated that either or both of these sets of data could have been previously input into the document layout deconstruction and redisplay system 400.

[0054] An input document is input into the deconstructing circuit, routine or application 440 under control of the controller 420. The deconstructing circuit, routine or application 440 reads image files and locates and isolates text area images and non-text area images. The non-text area images are then sent to the synthesizing circuit, routine or application 450 under control of the controller 420 for synthesizing into an intermediate data structure. Non-text images may optionally be compressed before being synthesized by the synthesizing circuit, routine or application 450.
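
As a rough illustration of locating and isolating text and non-text areas, the sketch below splits a binary page image into horizontal bands separated by blank rows. It then labels each band with a naive height and ink-density test. This heuristic and its thresholds (`max_text_height`, `max_text_density`) are assumptions chosen for the example, not the method of this disclosure. A practical deconstructor would more likely rely on connected-component analysis or an analysis of the page's background structure.

```python
from dataclasses import dataclass
from typing import List

Bitmap = List[List[int]]  # 1 = ink, 0 = background; assumed rectangular


@dataclass
class Region:
    top: int      # first row of the band (inclusive)
    bottom: int   # last row of the band (exclusive)
    kind: str     # "text" or "non-text"


def find_bands(page: Bitmap) -> List[range]:
    """Split the page into horizontal bands separated by all-background rows."""
    bands, start = [], None
    for y, row in enumerate(page):
        if any(row):
            if start is None:
                start = y
        elif start is not None:
            bands.append(range(start, y))
            start = None
    if start is not None:
        bands.append(range(start, len(page)))
    return bands


def classify(page: Bitmap, max_text_height: int = 3,
             max_text_density: float = 0.6) -> List[Region]:
    """Label each band as text or non-text with a naive height/density heuristic."""
    width = len(page[0]) if page else 0
    regions = []
    for band in find_bands(page):
        ink = sum(sum(page[y]) for y in band)
        density = ink / (len(band) * width)
        is_text = len(band) <= max_text_height and density <= max_text_density
        regions.append(Region(band.start, band.stop, "text" if is_text else "non-text"))
    return regions
```

Short, sparsely inked bands are treated as text lines, and tall or solid bands are treated as pictures. Real pages would need the thresholds scaled to the scan resolution.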

[0055] The deconstructing circuit, routine or application 440 reads the set of isolated text area images, locates and isolates text line regions, and detects the layout properties of the text line regions. The layout properties are sent to the synthesizing circuit, routine or application 450 under the control of the controller 420. The deconstructing circuit, routine or application 440 further processes the text line regions into a set of segmented image elements with their baseline relative portions. These are then sent to the synthesizing circuit, routine or application 450 under control of the controller 420 for synthesizing into an intermediate data structure. The deconstructing circuit, routine or application 440 may also compress the segmented image elements with their baseline relative portions into token-based image elements before they are sent to the synthesizing circuit, routine or application 450 under control of the controller 420 for synthesizing into an intermediate data structure.
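
The optional token-based compression in [0055] can be sketched as a dictionary of representative bitmaps. Under this approach, identical segmented image elements share one token, and each occurrence keeps only a token reference and its offset from the text baseline. The sketch makes several simplifying assumptions. It matches bitmaps exactly for brevity, whereas production symbol-matching coders (for example, the symbol dictionaries in JBIG2) typically group near-identical shapes. The names `Element`, `TokenRef` and `tokenize` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Glyph = Tuple[Tuple[int, ...], ...]  # immutable bitmap so it can serve as a dict key


@dataclass(frozen=True)
class Element:
    bitmap: Glyph
    baseline_offset: int  # rows of the element lying below the text baseline


@dataclass(frozen=True)
class TokenRef:
    token_id: int
    baseline_offset: int


def tokenize(elements: List[Element]) -> Tuple[Dict[int, Glyph], List[TokenRef]]:
    """Replace each element by a reference to a shared representative bitmap."""
    ids: Dict[Glyph, int] = {}
    table: Dict[int, Glyph] = {}
    refs: List[TokenRef] = []
    for el in elements:
        if el.bitmap not in ids:
            ids[el.bitmap] = len(ids)        # first occurrence defines a new token
            table[ids[el.bitmap]] = el.bitmap
        refs.append(TokenRef(ids[el.bitmap], el.baseline_offset))
    return table, refs
```

Each bitmap is stored only once. The per-occurrence list therefore grows by just a small reference for every repeated word or character.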

[0056] It should be appreciated that the deconstructing circuit, routine or application 440 and the synthesizing circuit, routine or application 450 can use any known or later-developed encoding scheme to deconstruct and synthesize the data into an intermediate data structure. That intermediate data structure may then be distilled by the distilling circuit, routine or application 460 for display on the display device 490.

[0057] The synthesizing circuit, routine or application 450 synthesizes the non-text area images and compressed non-text area image elements, the set of representative compressed image tokens, the segmented image elements and the layout characteristics, and transcribes the data into an intermediate data structure. The intermediate data structure is sent to the memory 430 under the control of the controller 420 for storage.
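
One possible shape for the intermediate data structure in [0057] is sketched below. It combines a shared token table with a list of layout blocks in reading order. Each block holds word-image references and non-text images, and the whole structure is serialized to JSON for storage in the memory 430. The field names, the block labels and the choice of JSON are illustrative assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Union


@dataclass
class WordImage:
    token_id: int         # index into the shared token table
    width: int
    height: int
    baseline_offset: int  # rows below the text baseline


@dataclass
class Figure:
    image_ref: str        # hypothetical key of a (possibly compressed) non-text image
    width: int
    height: int


@dataclass
class Block:
    kind: str             # hypothetical layout label, e.g. "paragraph" or "heading"
    indent: int = 0       # first-line indent in pixels
    items: List[Union[WordImage, Figure]] = field(default_factory=list)


@dataclass
class IntermediateDocument:
    tokens: Dict[int, List[List[int]]]  # token id -> representative bitmap
    blocks: List[Block]                 # blocks in reading order

    def to_json(self) -> str:
        """Serialize recursively; integer token ids become JSON string keys."""
        return json.dumps(asdict(self))
```

Because `asdict` drops class names, a real format would likely add an explicit type field to each item. That would let word images and figures be told apart when the stored structure is read back.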

[0058] Upon request by a user of the input document, the distilling circuit, routine or application 460 converts the intermediate data structure into a format usable by the display 490. The distilling circuit, routine or application 460, under control of the controller 420 and the input/output interface 410, outputs the converted intermediate data structure to the user's device for display.
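
Re-flowing for an arbitrarily sized display can be illustrated with a greedy line-filling pass over word images in reading order. This is a simplified sketch. It assumes a fixed inter-word spacing (`space`) and a single paragraph, and it considers only image widths. The names `Word` and `reflow` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    width: int  # width of the word image in pixels


def reflow(words: List[Word], display_width: int, space: int = 4) -> List[List[Word]]:
    """Greedily pack word images into lines no wider than display_width.

    A word wider than the display is placed on a line of its own
    (a real distiller might scale it down instead).
    """
    lines: List[List[Word]] = []
    current: List[Word] = []
    used = 0
    for w in words:
        needed = w.width if not current else used + space + w.width
        if current and needed > display_width:
            lines.append(current)          # close the full line
            current, used = [w], w.width   # start a new line with this word
        else:
            current.append(w)
            used = needed
    if current:
        lines.append(current)
    return lines
```

With word widths 30, 20, 50 and 10 and a 60-pixel display, this yields the lines [30, 20], [50] and [10]. With a 200-pixel display, all four words fit on one line.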

[0059] It should be appreciated that the distilling circuit, routine or application 460 can use any known or later-developed encoding scheme, including, but not limited to, those disclosed in this application, to convert the intermediate data structure into a device-specific format usable for redisplay on an arbitrarily sized display.
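
As one hypothetical device-specific target, the distilled output could be HTML in which each word image is an inline `<img>` element. The browser would then re-flow the words to fit the window width itself. Several details are assumptions of this sketch: the `(src, width, height, baseline offset)` tuple layout, the `scale` parameter, and the use of the CSS `vertical-align` property to lower each word image by its below-baseline depth.

```python
from html import escape
from typing import List, Tuple

# (image URL, width, height, rows below baseline) for each word image, in reading order
WordRef = Tuple[str, int, int, int]


def to_html(paragraphs: List[List[WordRef]], scale: float = 1.0) -> str:
    """Emit inline <img> elements so that a browser re-flows the word images."""
    out = []
    for para in paragraphs:
        imgs = []
        for src, w, h, below in para:
            # A negative vertical-align lowers the image so its baseline lines up
            style = f"vertical-align:{-round(below * scale)}px"
            imgs.append(
                f'<img src="{escape(src)}" width="{round(w * scale)}" '
                f'height="{round(h * scale)}" style="{style}" alt="">'
            )
        out.append("<p>" + " ".join(imgs) + "</p>")
    return "\n".join(out)
```

Delegating line breaking to the browser suits the Internet browsable format recited in the claims. A fixed-layout device would instead need an explicit line-filling step.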

[0060] In various exemplary embodiments, the systems and methods of this invention also relate to the use of special non-image markers, other than tags attached to particular image elements, to infer the functions and properties of all the image elements from their relative positions with respect to the markers within the intermediate data structure.
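
The marker idea in [0060] can be sketched as a flat stream of untagged image elements interleaved with a few non-image markers. Each element's role is inferred from the nearest preceding marker. Whether it opens a block is inferred from whether it immediately follows a marker. The marker vocabulary, the default role `"body"` and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Marker:
    role: str  # hypothetical marker vocabulary, e.g. "heading", "paragraph", "figure"


@dataclass(frozen=True)
class ImageElement:
    ref: str   # key of a word image or non-text image; carries no tag of its own


Stream = List[Union[Marker, ImageElement]]


def infer_roles(stream: Stream) -> List[Tuple[str, str, bool]]:
    """Return (ref, role, starts_block) for each image element.

    An element's role is that of the nearest preceding marker, and it
    starts a block if it immediately follows a marker.
    """
    result = []
    role, just_marked = "body", False  # assumed default before any marker
    for item in stream:
        if isinstance(item, Marker):
            role, just_marked = item.role, True
        else:
            result.append((item.ref, role, just_marked))
            just_marked = False
    return result
```

A single marker can describe an entire run of elements, so the stream stays compact even when individual word images are not tagged.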

[0061] While this invention has been described in conjunction with the exemplary embodiments outlined above, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, the exemplary embodiments of the invention, as set forth above, are intended to be illustrative, not limiting. Various changes may be made to the invention without departing from the spirit and scope thereof.

What is claimed is:
1. A method of converting a document in a page-image format into a form suitable for an arbitrarily sized display, comprising: deconstructing a document in a page image format; synthesizing the deconstructed document into an intermediate data structure; and distilling the intermediate data structure for redisplay in a format usable for an arbitrarily sized display.
2. The method of claim 1, wherein deconstructing a document in a page image format includes: identifying text image areas and non-text image areas of the document; locating and isolating text image areas and non-text image areas; processing the isolated text image areas and non-text image areas into text line regions and layout properties; processing located text line regions into segmented image elements; and locating and labeling segmented image elements.
3. The method of claim 2, wherein deconstructing a document in a page image into the set of segmented image elements includes at least one of physical segmentation of data and logical segmentation of data.
4. The method of claim 2, wherein the set of segmented image elements comprises at least one of blocks, lines, words, characters of text, groups of characters, and groups of non-text characters.
5. The method of claim 1, wherein synthesizing includes converting non-text image areas, layout properties and segmented image areas into the intermediate data structure.
6. The method of claim 2, wherein synthesizing the set of segmented image elements into an intermediate data structure includes integrating at least one of bitmapped images in an intelligible display layout and links to non-textual elements.
7. The method of claim 6, wherein the bitmapped images are images of words in reading order.
8. The method of claim 1, wherein the intermediate data structure is stored in a storage device.
9. The method of claim 1, wherein distilling the intermediate data structure for redisplay in a format usable for an arbitrarily sized display includes redisplaying the document in human readable format.
10. The method of claim 1, wherein distilling the intermediate data structure for redisplay in a format usable for an arbitrarily sized display includes redisplaying the document in at least one of an electronic book format, Internet browsable format and a print format.
11. The method of claim 1, wherein distilling the intermediate data structure includes converting the stored intermediate data structure into a device specific display format for display.
12. The method of claim 1, wherein the intermediate data structure is adaptable to at least one of display screen size, page size, resolution, contrast, color and geometry, at the time of display.
13. The method of claim 1, wherein adaptability of the intermediate data structure is supported by at least one of repagination of text, reflowing of text, logical links of text to associated text and non-textual content.
14. A method of converting a document in a page-image format into a form suitable for an arbitrarily sized display, comprising: analyzing page layout; converting a sequence of page images into a sequence of document element images captured in a tagged format; and re-converting the tagged format into at least one of an electronic book format, an Internet browsable format that can accept images and a print format.
15. The method of claim 14, wherein the tagged format preserves at least one of reading order and logical page layout properties.
16. A system of converting a document in a page-image format into a form suitable for an arbitrarily sized display, comprising: an input/output device; a controller; a deconstructing circuit, routine or application that deconstructs a document; a synthesizing circuit, routine or application that synthesizes the deconstructed document into an intermediate data structure; a distilling circuit, routine or application that distills the intermediate data structure for redisplay in a format usable for an arbitrarily sized display; and a memory.
17. The system of claim 16, wherein: the deconstructing circuit, routine or application deconstructs the document in a page image format into non-text image areas, layout properties, and a set of segmented image elements; the synthesizing circuit, routine or application synthesizes the non-text image areas, the layout properties, and the set of segmented image elements into an intermediate data structure; and the distilling circuit, routine or application distills the intermediate data structure for redisplay in a format usable for an arbitrarily sized display.
18. The system of claim 17, wherein the deconstructing circuit, routine or application deconstructs the document in a page image format into the set of segmented image elements that includes at least one of physical segmentation of data and logical segmentation of data.
19. The system of claim 17, wherein the intermediate data structure includes at least one of bitmapped images in an intelligible display layout and links to non-textual elements.
20. The system of claim 19, wherein the bitmapped images are images of words in reading order.
21. The system of claim 16, wherein the memory stores at least one of the document in page image format, the deconstructed document, the intermediate data structure and the distilled document.
22. The system of claim 16, wherein distilling of the intermediate data structure by the distilling circuit, routine or application for redisplay of the document in a format usable for an arbitrarily sized display includes redisplaying the document in at least one of an electronic book format, Internet browsable format, and a print format.
23. The system of claim 16, wherein the distilling circuit, routine or application converts the stored intermediate data structure into a device specific display format for display.
24. The system of claim 16, wherein the intermediate data structure is adaptable to at least one of display screen size, paper size, resolution, contrast, color and geometry, at the time of display.
25. The system of claim 16, wherein adaptability of the intermediate data structure is supported by at least one of repagination of text, reflowing of text, logical links of text to associated text and non-textual content.
26. The system of claim 16, wherein the deconstructing circuit, routine or application analyzes page layout and converts a sequence of page images into a sequence of document element images captured in a tagged format; and the distilling circuit, routine or application converts the tagged format into at least one of an electronic book format, an Internet browsable format that can accept images and a print format.
27. The system of claim 26, wherein the tagged format preserves at least one of reading order and logical page layout properties.
28. The system of claim 26, wherein the deconstructing routine includes a segmentation algorithm and a background structure analyzer.